The Role of Models in Predictive Validation

Author

  • John Maindonald
Abstract

Model choice and validation have a central role in data analysis, including predictive modeling. While standard diagnostics can help identify model inadequacies, it is natural to use predictive accuracy as the decisive criterion in the final choice of predictive model. A key point is that any assessment of predictive accuracy, theoretical or empirical, inevitably assumes a “data mechanism”, i.e., a sampling or other stochastic model that relates model predictions to the population that is the target for predictions. In a controversial paper Breiman [3] presents predictive accuracy as an obvious and natural criterion for model assessment. He criticises a statistical culture that has almost exclusively used models that assume a stochastic “data mechanism” that is thought to describe underlying scientific processes. Breiman argues for the wider use of algorithmic models, such as tree-based regression and neural nets, that “treat the data mechanism as unknown”. Disregard of data mechanisms has limits. Simple approaches to predictive validation assume, in effect, that the data are a random sample from the population to which predictions will be applied. This is often inappropriate. Cox [3, following comment] notes that predictions are often applied “under quite different conditions from the data”. Indeed, the conditions may be so different that a realistic assessment of predictive accuracy is impossible! In what follows I will (1) comment briefly on algorithmic models; (2) note that the interest is, in some contexts, in model parameters; (3) comment in more detail on predictive accuracy; (4) discuss implications for data mining and for the use of data bases.
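The point that simple predictive validation assumes a random sample from the target population can be illustrated with a small simulation. The sketch below is not from the paper. The data, the curved "true" relationship, the value ranges, and all function names are assumptions chosen for illustration. It fits a straight line to data drawn under one set of conditions, estimates its error by cross-validation on that same sample, and then measures the error on data drawn under quite different conditions.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def rmse(params, x, y):
    """Root mean squared prediction error of a fitted line on (x, y)."""
    slope, intercept = params
    return float(np.sqrt(np.mean((y - (slope * x + intercept)) ** 2)))


def cv_rmse(x, y, k=5, seed=0):
    """k-fold cross-validated RMSE, with folds drawn at random from the sample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    squared_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # all indices not in the held-out fold
        slope, intercept = fit_line(x[train], y[train])
        pred = slope * x[fold] + intercept
        squared_errors.append((y[fold] - pred) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(squared_errors))))


def simulate(n, low, high, rng):
    """Hypothetical data mechanism: a curved relationship plus noise."""
    x = rng.uniform(low, high, n)
    y = x ** 2 + rng.normal(0.0, 0.1, n)
    return x, y


rng = np.random.default_rng(1)
x_src, y_src = simulate(200, 0.0, 1.0, rng)  # conditions under which data were collected
x_tgt, y_tgt = simulate(200, 2.0, 3.0, rng)  # conditions under which predictions are applied

cv_estimate = cv_rmse(x_src, y_src)
target_error = rmse(fit_line(x_src, y_src), x_tgt, y_tgt)

print(f"cross-validated RMSE on the source sample: {cv_estimate:.3f}")
print(f"RMSE on the shifted target population:     {target_error:.3f}")
```

In this simulated setting the cross-validation estimate is far smaller than the error on the shifted population, because the random folds only mimic new data that arise under the same conditions as the sample.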


Similar resources

Designing an artificial neural network for the joint prediction of metabolic syndrome and the insulin resistance index (HOMA-IR): Tehran Lipid and Glucose Study

  Background & Objective: Mixed outcomes arise when, in a multivariate model, the response variables are measured on different scales, such as binary and continuous. In bivariate modeling with mixed response variables, the common methods of classical statistics have shortcomings. This study aimed to design an appropriate ANN model for modeling and predicting the bivariate mixed responses i...


Validation and application of empirical shear wave velocity models based on standard penetration test

Shear wave velocity (Vs) is a basic engineering parameter required to define the dynamic properties of soils. In many instances it may be preferable to determine Vs indirectly from common in-situ tests, such as the Standard Penetration Test. Many empirical correlations based on the Standard Penetration Test are broadly classified as regression techniques. However, no rigorous procedure has been published for c...


Logic regression and its application in predicting diseases

Regression is one of the most important statistical tools in data analysis and in the study of the relationship between predictor variables and the response variable. In most problems, regression models and decision trees can show only the main effects of predictor variables on the response, and the interactions considered do not go beyond two-way, or at most three-way, terms, due to co...


Quantitative Structure Activity Relationship Analysis of Coumarins as Free Radical Scavengers by Genetic Function Algorithm

The antioxidant properties of coumarin derivatives, measured using the 2,2′-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging assay, were investigated by applying Quantitative Structure Activity Relationship (QSAR) studies. The molecular structures were optimized and submitted for the generation of quantum chemical and molecular descriptors. Genetic Function Algorithm (GFA) was employed in m...


QSAR models to predict physico-chemical properties of some barbiturate derivatives using molecular descriptors and genetic algorithm-multiple linear regressions

In this study, the relationship between descriptors selected by a genetic algorithm and the Polarizability (POL), Molar Refractivity (MR) and Octanol/water Partition Coefficient (LogP) of barbiturates is studied. The chemical structures of the molecules were optimized using the ab initio method with the 6-31G basis set and the Polak-Ribiere conjugate gradient algorithm within HyperChem 8.0 environ...


A comparative QSAR study of aryl-substituted isobenzofuran-1(3H)-ones inhibitors

A comparative workflow, including linear and non-linear QSAR models, was carried out to evaluate the predictive accuracy of the models and to predict the inhibition activity of a series of aryl-substituted isobenzofuran-1(3H)-ones. The data set, consisting of 34 compounds, was randomly divided into training and test sets. Molecular descriptors were selected using the genetic algorithm (GA) as a f...



Publication date: 2003